Building a Linguistically Interpreted Corpus of Bulgarian: the BulTreeBank

Authors

Kiril Ivanov Simov

Petya Osenova

Milena Slavcheva

Sia Kolkovska

Elisaveta Balabanova

Dimitar Doikoff

Krassimira Ivanova

Alexander Simov

Milen Kouylekov

Abstract

In the field of Human Language Technology (HLT), the existence of linguistically interpreted real-world texts provides the license necessary for a given language to enter the area of high-tech applications. The significance of BulTreeBank lies in granting an HLT license to a “less processed” language like Bulgarian, which until recently has been formally modelled and processed mainly at the morphological level. The BulTreeBank project aims to create syntactically annotated data for Bulgarian, together with the tools for their production, management and automatic processing. It not only provides language resources but also develops an infrastructure of research solutions, production scenarios and services.

Similar resources

Segmentation Layers in the Group of the Predicate: a Case Study of Bulgarian within the BulTreeBank Framework

This paper describes the development of a regular grammar that automatically recognizes and delimits segments in the group of the predicate in sentences of Bulgarian. The language-specific segmentation is performed at the level of partial parsing where reliable, meaningful and useful entities are formed called chunks. The significance of the grammar development lies in the fact that it is a plu...

Challenges Behind the Data-driven Bulgarian WordNet (BulTreeBank Bulgarian Wordnet)

The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both syntactic and lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on the identification of senses used in BulTree...

A Data-Driven Dependency Parser for Bulgarian

One of the main motivations for building treebanks is that they facilitate the development of syntactic parsers, by providing realistic data for evaluation as well as inductive learning. In this paper we present what we believe to be the first robust data-driven parser for Bulgarian, trained and evaluated on data from BulTreeBank (Simov et al., 2002). The parser uses dependency-based representa...

A Treebank-driven Creation of an OntoValence Verb lexicon for Bulgarian

The paper presents a treebank-driven approach to the construction of a Bulgarian valence lexicon with ontological restrictions over the inner participants of the event. First, the underlying ideas behind the Bulgarian Ontology-based lexicon are outlined. Then, the extraction and manipulation of the valence frames is discussed with respect to the BulTreeBank annotation scheme and DOLCE ontology....

A Publicly Available Cross-Platform Lemmatizer for Bulgarian

Our dictionary-based lemmatizer for the Bulgarian language presented here is distributed as free software, publicly available to download and use under the GPL v3 license. The presented software is written entirely in Java and is distributed as a GATE plugin. To our best knowledge, at the time of writing this article, there are not any other free lemmatization tools specifically targeting the B...

Publication date: 2002
